AITopics | neural network function approximation

5c7c66dfc9f93f0c738947f3b1c13832-Paper-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 08:47:42 GMT

data mining, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)

Industry:

Health & Medicine (0.67)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek

Neural Information Processing Systems, Feb-11-2026, 14:47:15 GMT

Specifically, because a Q function is defined with respect to a particular policy, constructing $\hat{P}_Q$ requires selection of a reference policy or distribution over policies.

etal, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)

Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation

Neural Information Processing Systems, Dec-25-2025, 12:27:58 GMT

The Whittle index policy is a heuristic for the intractable restless multi-armed bandit (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMAB with neural network function approximation, which is an example of nonlinear two-timescale stochastic approximation with Q-function values updated on a faster timescale and Whittle indices on a slower timescale. Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which couples neural networks with two-timescale Q-learning, largely remains unclear. This paper provides a finite-time analysis of Neural-Q-Whittle, where data are generated from a Markov chain and the Q-function is approximated by a ReLU neural network. Our analysis leverages a Lyapunov drift approach to capture the evolution of two coupled parameters, and the nonlinearity in value function approximation further requires us to characterize the approximation error. Combining these provides Neural-Q-Whittle with an $\mathcal{O}(1/k^{2/3})$ convergence rate, where $k$ is the number of iterations.
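
The two-timescale structure mentioned in the abstract can be illustrated with a toy scalar sketch. This is not Neural-Q-Whittle and uses no neural network: the fast iterate `x` merely stands in for Q-function values, the slow iterate `lam` stands in for a Whittle index, and the function name, step-size exponents, noise level, and target are all assumptions chosen for illustration. The only property it shares with the setting described above is that one parameter is updated with a larger step size than the other, so the ratio of slow to fast step sizes tends to zero.

```python
import random


def two_timescale_demo(target=2.0, steps=20000, seed=0):
    """Toy two-timescale stochastic approximation (hypothetical example).

    Fast iterate x tracks the fixed point x*(lam) = lam under noise.
    Slow iterate lam is adjusted until x*(lam) equals `target`.
    """
    rng = random.Random(seed)
    x = 0.0    # fast iterate (stand-in for Q-function values)
    lam = 0.0  # slow iterate (stand-in for the Whittle index)
    for k in range(1, steps + 1):
        alpha = 1.0 / k ** 0.6  # fast step size (assumed schedule)
        beta = 1.0 / k ** 0.9   # slow step size; beta / alpha -> 0
        noise = rng.gauss(0.0, 0.1)
        # Fast update: noisy step toward the current fixed point lam.
        x += alpha * (lam - x + noise)
        # Slow update: nudge lam so the fast fixed point reaches target.
        lam += beta * (target - x)
    return x, lam


if __name__ == "__main__":
    x, lam = two_timescale_demo()
    print(f"fast iterate x = {x:.4f}, slow iterate lam = {lam:.4f}")
```

Because the slow step size decays faster, `lam` looks nearly constant from the point of view of `x`, which is the separation that a two-timescale analysis relies on; in this toy problem both iterates should settle near `target`.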

finite-time analysis, neural-q-whittle, restless multi-armed bandit, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.77)

5c7c66dfc9f93f0c738947f3b1c13832-Paper-Conference.pdf

Neural Information Processing Systems, Oct-8-2025, 18:28:59 GMT

data mining, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)

Industry:

Health & Medicine (0.67)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek

Neural Information Processing Systems, Oct-2-2025, 06:36:33 GMT

Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Reviews: Learning nonlinear level sets for dimensionality reduction in function approximation

Neural Information Processing Systems, Jan-23-2025, 08:52:03 GMT

In particular, the additional experiment on optimizing the dimensionality-reduced functions for the real-world example looks quite persuasive, and the explanation about adding a dummy variable to address odd-dimensional functions is also super valid. I also appreciate the authors for providing the detailed content of the modified paragraphs that they will include for the mathematical examples. The only small remaining issue concerns my point 6: the authors didn't seem to understand that the issue with Section 4.1 is that some of the sample points in the validation set may (almost) coincide with those in the training set. The authors should make sure that they have excluded points that are sufficiently close to the training set ones when generating the validation set, and clearly state this in the main text. That being said, I have decided to improve my score to 7 to acknowledge the sufficient improvement shown in the rebuttal. This paper considers the problem of dimensionality reduction for high-dimensional function approximation with small data.

approximation, dimensionality reduction, function approximation, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.63)

Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation

Neural Information Processing Systems, Jan-18-2025, 15:28:43 GMT

The Whittle index policy is a heuristic for the intractable restless multi-armed bandit (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMAB with neural network function approximation, which is an example of nonlinear two-timescale stochastic approximation with Q-function values updated on a faster timescale and Whittle indices on a slower timescale. Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which couples neural networks with two-timescale Q-learning, largely remains unclear. This paper provides a finite-time analysis of Neural-Q-Whittle, where data are generated from a Markov chain and the Q-function is approximated by a ReLU neural network.

neural network function approximation, neural-q-whittle, restless multi-armed bandit, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.70)

Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation

Xiong, Guojun, Li, Jian

arXiv.org Artificial Intelligence, Oct-3-2023

The Whittle index policy is a heuristic for the intractable restless multi-armed bandit (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMAB with neural network function approximation, which is an example of nonlinear two-timescale stochastic approximation with Q-function values updated on a faster timescale and Whittle indices on a slower timescale. Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which couples neural networks with two-timescale Q-learning, largely remains unclear. This paper provides a finite-time analysis of Neural-Q-Whittle, where data are generated from a Markov chain and the Q-function is approximated by a ReLU neural network. Our analysis leverages a Lyapunov drift approach to capture the evolution of two coupled parameters, and the nonlinearity in value function approximation further requires us to characterize the approximation error. Combining these provides Neural-Q-Whittle with an $\mathcal{O}(1/k^{2/3})$ convergence rate, where $k$ is the number of iterations.

approximation, neural-q-whittle, whittle index, (15 more...)

arXiv.org Artificial Intelligence

2310.02147

Country:

Oceania > New Zealand (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report (0.63)

Industry:

Health & Medicine (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization

Nguyen-Tang, Thanh, Gupta, Sunil, Nguyen, A. Tuan, Venkatesh, Svetha

arXiv.org Artificial Intelligence, Nov-26-2021

Offline policy learning (OPL) leverages existing data collected a priori for policy optimization without any active exploration. Despite the prevalence and recent interest in this problem, its theoretical and algorithmic foundations in function approximation settings remain under-developed. In this paper, we consider this problem on the axes of distributional shift, optimization, and generalization in offline contextual bandits with neural networks. In particular, we propose a provably efficient offline contextual bandit with neural network function approximation that does not require any functional assumption on the reward. We show that our method provably generalizes over unseen contexts under a milder condition for distributional shift than the existing OPL works. Notably, unlike any other OPL method, our method learns from the offline data in an online manner using stochastic gradient descent, allowing us to leverage the benefits of online learning into an offline setting. Moreover, we show that our method is more computationally efficient and has a better dependence on the effective dimension of the neural network than an online counterpart. Finally, we demonstrate the empirical effectiveness of our method in a range of synthetic and real-world OPL problems.

algorithm, inequality, neural network, (13 more...)

arXiv.org Artificial Intelligence

2111.13807

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
